(57) A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual [...] nucleic acid sequences together. Comparative analysis [...]
GOVERNMENT RIGHTS NOTICE

Portions of the material in this specification [...] Affymetrix, Inc. and the Department of Energy and/or [...] of Health.

BACKGROUND OF THE INVENTION 

The present invention relates to the field of computer systems. More specifically, the present invention relates to computer systems for visualizing biological sequences, as well as for [...] comparing biological sequences.

Devices and computer systems for forming and using arrays of materials on a substrate are known. For example, PCT applications WO92/10588 and [...], incorporated herein by reference for all purposes, describe techniques for sequencing or sequence checking nucleic acids [...]. Arrays for performing these operations may be formed according to [...] techniques disclosed in U.S. Patent Nos. [...].

According to one aspect of the techniques described therein, an array of nucleic acid probes is fabricated at known locations on a chip or substrate. A labeled nucleic acid [...] an image file (also called a cell file) [...]. Based upon the image file and identities of the probes at specific locations, it becomes possible to extract information such as [...] other genetic characteristics.

Improved computer systems and methods are needed to [...] process the vast amount of information now used and made available by these pioneering technologies.
SUMMARY OF THE INVENTION 

[...] nucleic acid sequence by the steps of:

- inputting multiple probe intensities, each of the probe intensities [...];

- calling the unknown base according to [...].

According to one specific aspect of the invention, [...] probe intensities of a sample sequence [...] call the unknown base. According to another specific aspect of the invention, [...]

According to another aspect of the invention, a method is [...] nucleic acid sequences to reduce the variations between the experiments by the steps of:

- providing a plurality of nucleic acid probes;

- labeling the reference nucleic acid sequence with a first marker;

- labeling the sample nucleic acid sequence with [...]; and

- hybridizing the labeled reference and sample nucleic acid sequences at the same time.

According to another aspect of the invention, a computer system is used to identify mutations in a sample nucleic acid sequence by the steps of:

- inputting a first set of probe intensities, each of the probe intensities in said first set being associated with a nucleic acid probe and substantially proportional to the hybridization of the associated nucleic acid probe with a reference nucleic acid sequence;

- inputting a second set of probe intensities, each of the probe intensities in said second set being associated with a nucleic acid probe and substantially proportional to the hybridization of the associated nucleic acid probe with said sample sequence;

- the computer system comparing probe intensities in the first set to probe intensities in the second set to select hybridization regions where the probe intensities in the first and second sets differ; and

- identifying mutations according to characteristics of the selected regions.
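The comparison step above can be pictured with a minimal Python sketch. It assumes the two sets of probe intensities are aligned position by position, flags a position when the sample intensity departs from the reference intensity by more than a relative threshold, and merges adjacent flagged positions into regions. The relative-difference test, the default threshold of 0.3, and the name `differing_regions` are illustrative assumptions; this summary does not say how the regions are selected.

```python
from typing import List, Optional, Sequence, Tuple


def differing_regions(reference: Sequence[float], sample: Sequence[float],
                      threshold: float = 0.3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs of runs of positions where the
    sample intensity differs from the reference intensity by more than
    `threshold`, measured relative to the reference intensity.

    The relative-difference criterion is an assumption made for illustration.
    """
    if len(reference) != len(sample):
        raise ValueError("intensity sets must be the same length")
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, (ref, smp) in enumerate(zip(reference, sample)):
        # Positions with a non-positive reference intensity are never flagged.
        differs = ref > 0 and abs(smp - ref) / ref > threshold
        if differs and start is None:
            start = i                        # a new region begins here
        elif not differs and start is not None:
            regions.append((start, i - 1))   # the current region ended at i - 1
            start = None
    if start is not None:                    # a region runs to the last position
        regions.append((start, len(reference) - 1))
    return regions
```

For example, with six reference intensities of 100 and sample intensities of 98, 102, 40, 35, 99, 101, the only region returned is (2, 3): the two adjacent positions whose intensities dropped sharply.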
According to yet another aspect of the invention, a computer system is used for comparative analysis and visualization of multiple sequences by the steps of:

- displaying at least one reference sequence in a first area on a display device; and

- displaying at least one sample sequence in a second area on said display device;

whereby a user is capable of visually comparing the multiple sequences.

A further understanding of the nature and advantages of the inventions herein may be realized by reference to the 
remaining portions of the specification and the attached drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS

Fig. 1 illustrates an example of a computer system used to execute the software of the present invention;

Fig. 2 shows a system block diagram of a typical computer system used to execute the software of the present invention;

Fig. 3 illustrates an overall system for forming and analyzing arrays of biological materials such as DNA or RNA;

Fig. 4 is an illustration of the software for the overall system;

Fig. 5 illustrates the global layout of a chip formed in the overall system;

Fig. 6 illustrates conceptually the binding of probes on chips;

Fig. 7 illustrates probes arranged in lanes on a chip;

Fig. 8 illustrates a hybridization pattern of a target on a chip with a reference sequence as in Fig. 7;

Fig. 9 illustrates the high level flow of the intensity ratio method;

Fig. 10A illustrates the high level flow of one implementation of the reference method and Fig. 10B shows an analysis table for use with the reference method;

Fig. 11A illustrates the high level flow of another implementation of the reference method; Fig. 11B shows a data table for use with the reference method; Fig. 11C shows a graph of the normalized sample base intensities minus the normalized reference base intensities; and Fig. 11D shows other graphs of data in the data table;

Fig. 12 illustrates the high level flow of the statistical method;

Fig. 13 illustrates the pooling processing of a reference and sample nucleic acid sequence;

Figs. 14A and 14C show graphs of scaled fluorescent intensities of wild-type probes hybridizing with sample and reference sequences, and Fig. 14B shows a hypothetical graph of fluorescent intensities of wild-type probes hybridizing with two sample sequences and a reference sequence;

Fig. 15 illustrates the high level flow of an embodiment that uses the hybridization data from more than one base position to identify mutations in a sample sequence;

Fig. 16 illustrates the main screen and the associated pull down menus for comparative analysis and visualization of multiple experiments;

Fig. 17 illustrates an intensity graph window for a selected base;

Fig. 18 illustrates multiple intensity graph windows for selected bases;

Fig. 19 illustrates the intensity ratio method correctly calling a mutation in solutions with varying concentrations;

Fig. 20 illustrates the reference method correctly calling a mutant base where the intensity ratio method incorrectly called the mutant base; and

Fig. 21 illustrates the output of the ViewSeq™ program with four pretreatment samples and four posttreatment samples.
DESCRIPTION OF THE PREFERRED EMBODIMENT

I. General

[...] environment. The present invention, however, is not limited to any particular [...] and may be advantageously applied to a variety of systems, including IBM personal computers [...]. [...] those skilled in the art [...] the following description of specific systems [...].

Fig. 1 illustrates an example of a computer system used to execute the software of the present invention. Fig. 1 shows a computer system 1 which includes [...] keyboard 9, and mouse 11. Mouse 11 may [...] a floppy disk drive [...] and a hard drive (not shown) [...] and the like.

Fig. 2 shows a system block diagram of [...] computer system 1 used to execute the software of the present invention. As in Fig. 1, computer system 1 includes [...]. Computer system 1 further includes subsystems such as a central processor 52, system memory [...], serial port 62, disk 64, network interface 66, and speaker 68. Disk [...] for use with the present invention may include [...] illustrative of any interconnection scheme serving to link [...] central processor 52. [...] other subsystems through a port or have an [...]. Computer system 1 shown in Fig. 2 is but an example of a computer system suitable for use with the present invention. Other configurations of subsystems suitable for use with the present invention will be readily apparent to one of ordinary skill in the art.

The VLSIPS™ technology [...]. See U.S. Patent No. 5,143,854 and [...] Nos. WO 90/15070 and 92/10092, each of which is incorporated by reference for all purposes. The oligonucleotide [...] on the DNA probe array are used to detect complementary nucleic acid [...]. [...] for a chip containing hybridized [...] to analyze other measurements of hybridization [...] generated by such systems.

Fig. 3 illustrates a computerized system for forming and analyzing arrays of biological materials such as RNA or DNA. A computer 100 is used to design [...]. The computer 100 may be, for example, an appropriately programmed [...] or workstation, such as an IBM PC equivalent, including appropriate memory [...] as shown in Figs. 1 and 2. The computer system 100 obtains inputs from a user regarding characteristics of a gene of interest [...]






EP0 717113A2 



computer files 104 in the form of, for example, a switch matrix, as described in PCT application WO 92/10092, and other associated computer files.

The chip design files are provided to a system 106 that designs the lithographic masks used in the fabrication of arrays of molecules such as DNA. The system or process 106 may include the hardware necessary to manufacture masks 110 and also the computer hardware and software 108 necessary to lay the mask patterns out on the mask in an efficient manner. As with the other features in Fig. 3, such equipment may or may not be located at the same physical site, but is shown together for ease of illustration in Fig. 3. The system 106 generates masks 110 or other synthesis patterns such as chrome-on-glass masks for use in the fabrication of polymer arrays.

The masks 110, as well as selected information relating to the design of the chips from system 100, are used in a synthesis system 112. Synthesis system 112 includes the necessary hardware and software used to fabricate arrays of polymers on a substrate or chip 114. For example, synthesizer 112 includes a light source 116 and a chemical flow cell 118 on which the substrate or chip 114 is placed. Mask 110 is placed between the light source and the substrate/chip, and the two are translated relative to each other at appropriate times for deprotection of selected regions of the chip. Selected chemical reagents are directed through flow cell 118 for coupling to deprotected regions, as well as for washing and other operations. All operations are preferably directed by an appropriately programmed computer 119, which may or may not be the same computer as the computer(s) used in mask design and mask making.

The substrates fabricated by synthesis system 112 are optionally diced into smaller chips and exposed to marked receptors. The receptors may or may not be complementary to one or more of the molecules on the substrate. The receptors are marked with a label such as a fluorescein label (indicated by an asterisk in Fig. 3) and placed in scanning system 120. Scanning system 120 again operates under the direction of an appropriately programmed digital computer 122, which also may or may not be the same computer as the computers used in synthesis, mask making, and mask design. The scanner 120 includes a detection device 124 such as a confocal microscope or CCD (charge-coupled device) that is used to detect the locations where labeled receptor (*) has bound to the substrate. The output of scanner 120 is an image file(s) 124 indicating, in the case of fluorescein-labeled receptor, the fluorescence intensity (photon counts or other related measurements, such as voltage) as a function of position on the substrate. Since higher photon counts will be observed where the labeled receptor has bound more strongly to the array of polymers, and since the monomer sequence of the polymers on the substrate is known as a function of position, it becomes possible to determine the sequence(s) of polymer(s) on the substrate that are complementary to the receptor.

The image file 124 is provided as input to an analysis system 126 that incorporates the visualization and analysis methods of the present invention. Again, the analysis system may be any one of a wide variety of computer system(s), but in a preferred embodiment the analysis system is based on a Sun Workstation or equivalent. The present invention provides various methods of analyzing the chip design files and the image files, providing appropriate output 128. The present invention may further be used to identify specific mutations in a receptor such as DNA or RNA.

Fig. 4 provides a simplified illustration of the overall software system used in the operation of one embodiment of the invention. As shown in Fig. 4, in some cases (such as sequence checking systems) the system first identifies the genetic sequence(s) or targets that would be of interest in a particular analysis at step 202. The sequences of interest may, for example, be normal or mutant portions of a gene, genes that identify heredity or provide forensic information, or all possible n-mers (where n represents the length of the nucleic acid). Sequence selection may be provided via manual input of text files or may be from external sources such as GenBank. At step 204 the system evaluates the gene to determine or assist the user in determining which probes would be desirable on the chip, and provides an appropriate "layout" on the chip for the probes. The chip usually includes probes that are complementary to a reference nucleic acid sequence which has a known sequence. A wild-type probe is a probe that will ideally hybridize with the reference sequence, and thus a wild-type gene (also called the chip wild-type) would ideally hybridize with wild-type probes on the chip. The target sequence is substantially similar to the reference sequence except for the presence of mutations, insertions, deletions, and the like. The layout implements desired characteristics such as an arrangement on the chip that permits "reading" of genetic sequence and/or minimization of edge effects, ease of synthesis, and the like.

Fig. 5 illustrates the global layout of a chip in a particular embodiment used for sequence checking applications. Chip 114 is composed of multiple units where each unit may contain different tilings for the chip wild-type sequence. Unit 1 is shown in greater detail and shows that each unit is composed of multiple cells, which are areas on the chip that may contain probes. Conceptually, each unit is composed of multiple sets of related cells. As used herein, the term cell refers to a region on a substrate that contains many copies of a molecule or molecules of interest. Each unit is composed of multiple cells that may be placed in rows (or "lanes") and columns. In one embodiment, a set of five related cells includes the following: a wild-type cell 220, "mutation" cells 222, and a "blank" cell 224. Cell 220 contains a wild-type probe that is the complement of a portion of the wild-type sequence. Cells 222 contain "mutation" probes for the wild-type sequence. For example, if the wild-type probe is 3'-ACGT, the probes 3'-ACAT, 3'-ACCT, 3'-ACGT, and 3'-ACTT may be the "mutation" probes. Cell 224 is the "blank" cell because it contains no probes (also called the "blank" probe). As the blank cell contains no probes, labeled receptors should not bind to the chip in this area. Thus, the blank cell provides an area that can be used to measure the background intensity.
[...] the wild-type target sequence is 5'-ACG[...]GCA-3' [...] synthesized with a "4x3" tiling strategy [...]
Table 1

Probe Sequences (from 3'-end), 4x3 Opt-Tiling

Wild     TGCA  GCAT  CATA  ATAC  TACG
A sub.   TGAA  GCAT  CAAA  ATAC  TAAG
C sub.   TGCA  GCCT  CACA  ATCC  TACG
G sub.   TGGA  GCGT  CAGA  ATGC  TAGG
T sub.   TGTA  GCTT  CATA  ATTC  TATG

Wild     TGCA  GCAA  CAAA  AAAC  AACG
A sub.   TGAA  GCAA  CAAA  AAAC  AAAG
C sub.   TGCA  GCCA  CACA  AACC  AACG
G sub.   TGGA  GCGA  CAGA  AAGC  AAGG
T sub.   TGTA  GCTA  CATA  AATC  AATG
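The first block of Table 1 can be regenerated mechanically: its "Wild" row is the run of overlapping four-base windows of the strand 3'-TGCATACG (read directly off that row), and each substitution row replaces the third base of every window with the named base. The sketch below assumes that reading of "4x3" (four-base probes, substitution at the third position from the 3'-end), which matches every entry of the first block; the function name `tile` is hypothetical.

```python
from typing import Dict, List


def tile(complement_3to5: str, probe_len: int = 4, pos: int = 2) -> Dict[str, List[str]]:
    """Build a substitution tiling from the strand that pairs with the target.

    complement_3to5: pairing strand written 3'->5' (e.g. "TGCATACG").
    probe_len: probe length (4 in Table 1).
    pos: 0-based substitution position within each probe (the third base here).
    Returns a "Wild" row plus one row per substituted base, in table order.
    """
    windows = [complement_3to5[i:i + probe_len]
               for i in range(len(complement_3to5) - probe_len + 1)]
    rows = {"Wild": windows}
    for base in "ACGT":
        rows[f"{base} sub."] = [w[:pos] + base + w[pos + 1:] for w in windows]
    return rows


if __name__ == "__main__":
    for name, probes in tile("TGCATACG").items():
        print(f"{name:8} {'  '.join(probes)}")
```

Applied to 3'-TGCAAACG, the same function produces the second block of Table 1.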
In the first "chip" above, [...] mutation as shown below. [...] the target DNA sequence [...] column 3. For example, the mutant sequence will not bind that strongly to the [...] that strongly to any of the probes [...]. This results in a relatively dark scanned area around a mutation [...] be further detected.

Again referring to Fig. 4, at step 206 the masks for the synthesis are designed. At step 208 the software utilizes the [...] software 208 will control relative [...]

[...] DNA to an array of DNA probes [...]. As [...]
3'-AGAACGT
3'-AGACCGT
3'-AGAGCGT
3'-AGATCGT



As shown, the probes in the set differ by only one base, so the probes are designed to determine the identity of the base at that position in the nucleic acid sequence.

When a fluorescein-labeled (or otherwise marked) target with the sequence 5'-TCTTGCA is exposed to the array, it is complementary only to the probe 3'-AGAACGT, and fluorescein will be primarily found on the surface of the chip where 3'-AGAACGT is located. Thus, for each set of probes that differ by only one base, the image file will contain four fluorescence intensities, one for each probe. Each fluorescence intensity can therefore be associated with the base of each probe that is different from the other probes. Additionally, the image file will contain a "blank" cell which can be used as the fluorescence intensity of the background. By analyzing the five fluorescence intensities associated with a specific base location, it becomes possible to extract sequence information from such arrays using the methods of the invention disclosed herein.
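To make the pairing rule concrete, the sketch below writes probes 3'->5' and targets 5'->3', so that position i of a probe pairs with position i of the target, and checks which probe in the set above is the perfect complement of a target. The target 5'-TCTTGCA pairs base-for-base with 3'-AGAACGT. The helper names `pairing_strand` and `perfect_matches` are illustrative only.

```python
from typing import List

# Watson-Crick base pairing for DNA.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pairing_strand(target_5to3: str) -> str:
    """Return the strand that pairs with the target, written 3'->5' so that
    position i of the result pairs with position i of the target."""
    return "".join(PAIR[base] for base in target_5to3)


def perfect_matches(target_5to3: str, probes_3to5: List[str]) -> List[str]:
    """Return the probes (written 3'->5') that are fully complementary to the target."""
    wanted = pairing_strand(target_5to3)
    return [probe for probe in probes_3to5 if probe == wanted]


if __name__ == "__main__":
    probes = ["AGAACGT", "AGACCGT", "AGAGCGT", "AGATCGT"]
    print(perfect_matches("TCTTGCA", probes))
```

The other three probes each carry one mismatched base, which is why their cells are expected to show lower intensity.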

Fig. 7 illustrates probes arranged in lanes on a chip. A reference sequence is shown with five interrogation positions marked with number subscripts. An interrogation position is a base position in the reference sequence where the target sequence may contain a mutation or otherwise differ from the reference sequence. The chip may contain five probe cells that correspond to each interrogation position. Each probe cell contains a set of probes that have a common base at the interrogation position. For example, at the first interrogation position, I1, the reference sequence has a base T. The wild-type probe for this interrogation position is 3'-TGAC, where the base A in the probe is complementary to the base at the interrogation position in the reference sequence.

Similarly, there are four "mutant" probe cells for the first interrogation position, I1. The four mutant probes are 3'-TGAC, 3'-TGCC, 3'-TGGC, and 3'-TGTC. Each of the four mutant probes varies by a single base at the interrogation position. As shown, the wild-type and mutant probes are arranged in lanes on the chip. One of the mutant probes (in this case 3'-TGAC) is identical to the wild-type probe and therefore does not evidence a mutation. However, the redundancy gives a visual indication of mutations, as will be seen in Fig. 8.

Still referring to Fig. 7, the chip contains wild-type and mutant probes for each of the other interrogation positions I2-I5. In each case, the wild-type probe is equivalent to one of the mutant probes.

Fig. 8 illustrates a hybridization pattern of a target on a chip with a reference sequence as in Fig. 7. The reference sequence is shown along the top of the chip for comparison. The chip includes a WT-lane (wild-type), an A-lane, a C-lane, a G-lane, and a T-lane (or U-lane). Each lane is a row of cells containing probes. The cells in the WT-lane contain probes that are complementary to the reference sequence. The cells in the A-, C-, G-, and T-lanes contain probes that are complementary to the reference sequence except that the named base is at the interrogation position.

In one embodiment, the hybridization of probes in a cell is determined by the fluorescent intensity (e.g., photon counts) of the cell resulting from the binding of marked target sequences. The fluorescent intensity may vary greatly among cells. For simplicity, Fig. 8 shows a high degree of hybridization by a cell containing a darkened area. The WT-lane allows a simple visual indication that there is a mutation at interrogation position I4 because the wild-type cell is not dark at that position. The cell in the C-lane is darkened, which indicates that the mutation is from T->G (mutant probe cells are complementary, so the C-cell indicates a G mutation).

In practice, the cells near an interrogation position having a mutation have relatively low fluorescent intensities, creating "dark regions" around a mutation. The lower fluorescent intensities result because the cells at interrogation positions near a mutation do not contain probes that are perfectly complementary to the target sequence; thus, the hybridization of these probes with the target sequence is lower. For example, the relative intensity of the cells at interrogation positions I3 and I5 may be relatively low because none of the probes therein are complementary to the target sequence.
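A minimal sketch of reading a lane pattern like that of Fig. 8: at each interrogation position the brightest of the A-, C-, G-, and T-lanes names the probe base, and because the probes are complementary, the target base is that base's complement. The reference bases and intensities in the usage example are hypothetical values chosen for illustration (they are not the Fig. 8 data), with a T->G change placed at the fourth position; `read_lanes` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Watson-Crick complements: a probe base pairs with the complementary target base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def read_lanes(reference: str, lanes: Dict[str, List[float]]) -> List[Tuple[int, str, str]]:
    """Report interrogation positions where the brightest lane disagrees with the reference.

    reference: reference bases at the interrogation positions, in order.
    lanes: probe base ("A", "C", "G", "T") -> intensities, one per position.
    Returns (1-based position, reference base, called target base) tuples.
    """
    calls = []
    for i, ref_base in enumerate(reference):
        brightest = max(lanes, key=lambda base: lanes[base][i])
        target_base = COMPLEMENT[brightest]  # e.g. a bright C-lane cell means G in the target
        if target_base != ref_base:
            calls.append((i + 1, ref_base, target_base))
    return calls


if __name__ == "__main__":
    hypothetical_lanes = {
        "A": [90, 5, 4, 6, 3],
        "C": [6, 4, 5, 70, 85],
        "G": [5, 6, 80, 5, 4],
        "T": [4, 88, 6, 7, 5],
    }
    print(read_lanes("TACTG", hypothetical_lanes))
```

With these numbers, the only reported position is the fourth, where the bright C-lane cell indicates a G in place of the reference T.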

[...] call bases by assigning the bases the following codes:

Code   Group               Meaning
A      A                   Adenine
C      C                   Cytosine
G      G                   Guanine
T      T(U)                Thymine (Uracil)
M      A or C              aMino
R      A or G              puRine
W      A or T(U)           Weak interaction (2 H bonds)
Y      C or T(U)           pYrimidine
S      C or G              Strong interaction (3 H bonds)
K      G or T(U)           Keto
V      A, C, or G          not T(U)
H      A, C, or T(U)       not G
D      A, G, or T(U)       not C
B      C, G, or T(U)       not A
N      A, C, G, or T(U)    Insufficient intensity to call
X      A, C, G, or T(U)    Insufficient discrimination to call

Most of the codes conform to the IUPAC standard. However, code N has been redefined and code X has been added.
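The code table can be expressed as a lookup from the set of candidate bases to a one-letter code. In this minimal sketch, `base_code` and its `low_intensity` flag are hypothetical names; because N and X cover the same four bases, the caller has to supply the reason (insufficient intensity or insufficient discrimination) that the table distinguishes.

```python
from typing import Iterable

# Two- and three-base ambiguity codes from the table above.
AMBIGUITY_CODES = {
    frozenset("AC"): "M", frozenset("AG"): "R", frozenset("AT"): "W",
    frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("GT"): "K",
    frozenset("ACG"): "V", frozenset("ACT"): "H", frozenset("AGT"): "D",
    frozenset("CGT"): "B",
}


def base_code(bases: Iterable[str], low_intensity: bool = False) -> str:
    """Map a collection of candidate bases to a one-letter code.

    U is treated as T. A single base maps to itself. When all four bases
    remain possible, `low_intensity` selects N (insufficient intensity)
    instead of X (insufficient discrimination).
    """
    normalized = frozenset("T" if b.upper() == "U" else b.upper() for b in bases)
    if not normalized or not normalized <= set("ACGT"):
        raise ValueError(f"unexpected bases: {sorted(normalized)}")
    if len(normalized) == 1:
        return next(iter(normalized))
    if len(normalized) == 4:
        return "N" if low_intensity else "X"
    return AMBIGUITY_CODES[normalized]
```

For instance, a call of "C or A" maps to M, and "G or U" maps to K.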



II. Intensity Ratio Method

The unknown base may be identified by [...] have the same sequence as the target sequence except for a mutation of base C to base T at the underlined base position. Although hundreds of probes may be synthesized on the chip, the complementary mutation probes synthesized to detect a mutation in the sample sequence at the suspected mutation position may be as follows:

3'-TATC 

3'-TCTC 

3'-TGTC (wild-type) 
3'-TTTC 

The mutation probe 3'-TGTC is also the wild-type probe, as it should bind most strongly with the target sequence.

After the sample sequence is labeled, hybridized on the chip, and scanned, suppose the following fluorescence intensities were obtained:

3'-TATC -> 45
3'-TCTC -> 8
3'-TGTC -> 32
3'-TTTC -> 12

where the intensity is measured by the photon count detected by the scanner. The "blank" cell had a fluorescence 
intensity of 2. The photon counts in the examples herein are representative (not actual data) and provided for illustration 
purposes. In practice, the actual photon counts will vary greatly depending on the experiment parameters and the scanner 
utilized. 

Although each fluorescence intensity is from a probe, the probes may be characterized by their unique mutation base, so the bases may be said to have the following intensities:

A -> 45
C -> 8
G -> 32
T -> 12

Thus, base A will be described as having an intensity of 45, which corresponds to the intensity of the mutation probe with the mutation base A.

Initially, each mutation base intensity is reduced by the background or "blank" cell intensity. This is done as follows:

A -> 45 - 2 = 43
C -> 8 - 2 = 6
G -> 32 - 2 = 30
T -> 12 - 2 = 10

Then, the base intensities are sorted in descending order of intensity. The above bases would be sorted as follows:

A -> 43
G -> 30
T -> 10
C -> 6

Next, the highest intensity base is compared to the second highest intensity base. Thus, the ratio of the intensity of base A to the intensity of base G is calculated as follows: A:G = 43 / 30 = 1.4. The ratio A:G is then compared to a predetermined ratio cutoff, which is a number that specifies the ratio required to identify the unknown base. For example, if the ratio cutoff is 1.2, the ratio A:G is greater than the ratio cutoff (1.4 > 1.2), and the unknown base is called by the mutation probe containing the mutation A. As probes are complementary to the sample sequence, the sample sequence is called as having a mutation T, resulting in a called sample sequence of 5'-ATGTGQAEAGTTGTA-3' (SEQ ID NO:2).

As another example, suppose everything else is the same as in the previous example except that the sorted background-adjusted intensities were as follows:

C -> 42
A -> 40
G -> 10
T -> 8

The ratio of the highest intensity base to the second highest intensity base (C:A) is 1.05. Because this ratio is not greater than the ratio cutoff of 1.2, the unknown base will be called as being ambiguously one of two or more bases, as follows.

The second highest intensity base is then compared to the third highest intensity base. The ratio A:G is 4. The ratio A:G is then compared to the ratio cutoff of 1.2. As the ratio A:G is greater than the ratio cutoff (4 > 1.2), the unknown base is called by the mutation probes containing the mutations C or A. As probes are complementary to the sample sequence, [...]
[...] numbers in future calculations. [...] Each base is then associated with a number from 1 to 4 [...] base. The intensity is [...] as shown at step 310. [...] 2 is greater than the ratio [...] the ratio of the intensity of base 2 to base [...].

At step 318 the ratio of intensity of bases 2:3 [...] of the highest or second [...] code X (insufficient discrimination) as shown at step 328.

[...] advantage of the intensity ratio method is that it [...] discrimination between the fluorescence intensities of hybrid matches and hybrid mismatches [...] hybrid gives a lower intensity than a mismatch [...] base corresponding to a correct [...] base will result. For this reason, however, the method is [...] assessment of hybridization quality and as an indicator of sequence-specific problem spots [...] to determine [...]
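The calling procedure worked through in the two examples above can be sketched in Python as follows, under stated assumptions. The cutoff of 1.2 comes from the examples. The floor applied to non-positive background-adjusted intensities is a hypothetical value; the passage above only indicates that such values are adjusted so later divisions stay valid. Continuing the comparison down the ranking before falling back to code X follows the partial description of steps 318 and 328 above; the exact flow of Fig. 9 is not recoverable here. The function returns the called probe base(s); the corresponding sample base is the complement.

```python
from typing import Dict, List


def call_by_intensity_ratio(intensities: Dict[str, float], blank: float,
                            cutoff: float = 1.2, floor: float = 1.0) -> List[str]:
    """Sketch of the intensity ratio method for one unknown base.

    intensities: raw mutation-base intensities, e.g. {"A": 45, "C": 8, "G": 32, "T": 12}.
    blank: intensity of the associated "blank" (background) cell.
    cutoff: ratio a base must exceed over the next-ranked base to be separated from it.
    floor: hypothetical minimum for background-adjusted values, so that no
           ratio divides by zero or a negative number.

    Returns the called probe base(s), highest intensity first. An empty list
    means no ratio exceeded the cutoff (code X, insufficient discrimination).
    """
    # Subtract the background and clamp non-positive results to the floor.
    adjusted = {base: max(value - blank, floor) for base, value in intensities.items()}
    # Sort bases in descending order of adjusted intensity.
    ranked = sorted(adjusted, key=adjusted.get, reverse=True)
    # Compare each base with the next one down; the first ratio above the
    # cutoff separates the called base(s) from the rest.
    for k in range(1, len(ranked)):
        if adjusted[ranked[k - 1]] / adjusted[ranked[k]] > cutoff:
            return ranked[:k]
    return []


if __name__ == "__main__":
    print(call_by_intensity_ratio({"A": 45, "C": 8, "G": 32, "T": 12}, blank=2))
    print(call_by_intensity_ratio({"C": 42, "A": 40, "G": 10, "T": 8}, blank=0))
```

The first call reproduces the single-base call A (43 / 30 > 1.2). The second example's intensities are already background-adjusted, hence `blank=0`; it returns the probe bases C and A, since C:A = 1.05 fails the cutoff but A:G = 4 passes.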



III. Reference Method



The reference method is a method of calling bases in a [...] depends very little on discrimination [...] of a reference [...].

There are two implementations of the reference method [...] to identify one unknown base in a sample [...]. For simplicity, the reference method [...] bases in a nucleic acid sequence.

[...] wild-type, as it may have mutations. [...] sequences will each be associated with up to four mutation probes and a "blank" cell. The unknown [...] by comparing probe intensities [...] bases at the same position in the reference [...]
[...] position, which is the unknown base that will be called by the reference method. The "mutation" probes for the sample sequence may be as follows: 3'-GAAA, 3'-GCAA, 3'-GGAA, and 3'-GTAA, where 3'-GGAA is the wild-type probe.

Suppose further that a reference sequence, which differs from the chip wild-type by one base mutation, has the sequence 5'-AGACATTGC-3', where the mutation base is underlined. The "mutation" probes for the reference sequence may be as follows: 3'-TGAAA, 3'-TGCAA, 3'-TGGAA, and 3'-TGTAA, where 3'-TGTAA is the reference wild-type probe since the reference sequence is known. Although generally the sample and reference sequences were tiled with the same chip wild-type, this is not required, and the tiling methods do not have to be identical, as shown by the use of two probe lengths in the example. Thus, the unknown base will be called by comparing the "mutation" probes of the sample sequence to the "mutation" probes of the reference sequence. As before, because each mutation probe is identifiable by the mutation base, the mutation probes' intensities will be referred to as the "base intensities" of their respective mutation bases.

As a simple example of one implementation of the reference method, suppose a gene of interest (target) has the sequence 5'-AAAACTGAAAA-3' (SEQ ID NO:4). Suppose a reference sequence has the sequence 5'-AAAACCGAAAA-3' (SEQ ID NO:5), which differs from the target sequence by the underlined base. The reference sequence is marked and exposed to probes on a chip with the target sequence being the chip wild-type. Suppose further that a sample sequence is suspected to have the same sequence as the target sequence except for a mutation at the underlined base position in 5'-AAAACTGAAAA-3' (SEQ ID NO:4). The sample sequence is also marked and exposed to probes on a chip with the target sequence being the chip wild-type. After hybridization and scanning, the following probe intensities (not actual data) were found for the respective complementary probes:

Reference            Sample
3'-TGAC -> 12        3'-GACT -> 11
3'-TGCC -> 9         3'-GCCT -> 30
3'-TGGC -> 80        3'-GGCT -> 60
3'-TGTC -> 15        3'-GTCT -> 6



Although each fluorescence intensity is from a probe, the probes may be identified by their unique mutation base, so the bases may be said to have the following intensities:

Reference      Sample
A -> 12        A -> 11
C -> 9         C -> 30
G -> 80        G -> 60
T -> 15        T -> 6

Thus, base A of the reference sequence will be described as having an intensity of 12, which corresponds to the intensity of the mutation probe with the mutation base A. The reference method will now be described as calling the unknown base in the sample sequence by using these intensities.

Fig. 10A illustrates the high level flow of one implementation of the reference method. For illustration purposes, the reference method is described as filling in the columns (identified by the numbers along the bottom) of the analysis table shown in Fig. 10B. However, the generation of an analysis table is not necessary to practice the method. The analysis table is shown to aid the reader in understanding the method.

At step 402 the four base intensities of the reference and sample sequences are adjusted by subtracting the background or "blank" cell intensity from each base intensity. Each set of "mutation" probes has an associated "blank" cell. Suppose that the reference "blank" cell intensity is 1 and the sample "blank" cell intensity is 2. The base intensities are then background subtracted as follows:
Reference              Sample
A -> 12 - 1 = 11       A -> 11 - 2 = 9
C -> 9 - 1 = 8         C -> 30 - 2 = 28
G -> 80 - 1 = 79       G -> 60 - 2 = 58
T -> 15 - 1 = 14       T -> 6 - 2 = 4



Preferably, if a base intensity is then less than or equal to zero, the base intensity is set equal to a [...] to prevent division by zero or negative numbers in future calculations.

[...] the position of each base of interest in the reference and sample sequences is placed in column [...].

At step [...] the base intensity associated with the reference wild-type (column 2 of the analysis table) is checked [...] the base intensity associated with the reference wild-type has sufficient intensity [...] is placed in column 3 of the analysis table.

[...] calculated:

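G:A -> 79/11 = 7.2
G:C -> 79/8 = 9.9
G:G -> 79/79 = 1.0
G:T -> 79/14 = 5.6

These ratios are placed in columns 4 through 7 of the analysis table, respectively. The sample sequence is then checked for sufficient intensity in the same way, with its 'P' or 'F' result placed in column 8 of the analysis table, and the ratio of its highest base intensity to each of its base intensities is calculated. The ratio of the highest intensity to itself will be 1 and the other ratios will usually be greater than 1. In this example the highest sample base intensity is G, so the following ratios are calculated: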

G:A -> 58/9 = 6.4
G:C -> 58/28 = 2.3
G:G -> 58/58 = 1.0
G:T -> 58/4 = 14.5

These ratios are placed in columns 9 through 12 of the analysis table, respectively. 
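A minimal Python sketch of how the ratios in columns 4 through 7 (reference) and 9 through 12 (sample) might be computed from the background-subtracted intensities. The function name `highest_intensity_ratios` is hypothetical, and the sketch assumes all intensities are already positive after the clamping step. It returns unrounded values; the document reports them to one decimal place.

```python
def highest_intensity_ratios(adjusted):
    """Divide the highest background-subtracted intensity by each base's intensity.

    The highest base's ratio to itself is 1.0; the others are usually > 1.
    Assumes all values are already positive (see the clamping step).
    """
    highest = max(adjusted.values())
    return {base: highest / value for base, value in adjusted.items()}


# Columns 4-7 (reference) and columns 9-12 (sample) of the analysis table.
reference_ratios = highest_intensity_ratios({"A": 11, "C": 8, "G": 79, "T": 14})
sample_ratios = highest_intensity_ratios({"A": 9, "C": 28, "G": 58, "T": 4})
```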

At step 416, if both the reference and sample sequence probes failed to have sufficient intensity to call the unknown base, meaning there is an 'F' in columns 3 and 8 of the analysis table, the unknown base is assigned the code N (insufficient intensity) as shown at step 418. An 'N' is placed in column 17 of the analysis table. Additionally, a confidence code of 9 is placed in column 18 of the analysis table, where the confidence codes have the following meanings:



Code    Meaning
0       Probable reference wild-type
1       Probable mutation
2       Reference sufficient intensity, insufficient intensity in sample suggests possible mutation
3       Borderline differences, unknown base ambiguous
4       Sample sufficient intensity, insufficient intensity in reference to allow comparison
5-8     Currently unassigned
9       Insufficient intensity in reference and sample, no interpretation possible



The confidence codes are useful for indicating to the user the result of the reference method's analysis.

At step 420, if only the reference sequence probes failed to have sufficient intensity to call the unknown base, meaning there is an 'F' in column 3 and a 'P' in column 8 of the analysis table, the unknown base is assigned the code N (insufficient intensity) as shown at step 422. An 'N' is placed in column 17 and a confidence code of 4 is placed in column 18 of the analysis table.

At step 424, if only the sample sequence probes failed to have sufficient intensity to call the unknown base, meaning there is a 'P' in column 3 and an 'F' in column 8 of the analysis table, the unknown base is assigned the code N (insufficient intensity) as shown at step 426. An 'N' is placed in column 17 and a confidence code of 2 is placed in column 18 of the analysis table.
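The intensity checks of steps 416 through 426 amount to a small decision table. The Python sketch below is an illustration only: `intensity_gate` and `CONFIDENCE` are hypothetical names, the code meanings are abbreviated from the table above, and the two booleans stand in for the 'P'/'F' entries in columns 3 and 8.

```python
# Abbreviated meanings of the confidence codes in the table above (5-8 are unassigned).
CONFIDENCE = {
    0: "Probable reference wild-type",
    1: "Probable mutation",
    2: "Reference sufficient intensity, insufficient intensity in sample",
    3: "Borderline differences, unknown base ambiguous",
    4: "Sample sufficient intensity, insufficient intensity in reference",
    9: "Insufficient intensity in reference and sample",
}


def intensity_gate(reference_pass, sample_pass):
    """Steps 416-426: return (call, confidence code), or None if both passed.

    reference_pass and sample_pass mirror the 'P'/'F' entries in
    columns 3 and 8 of the analysis table.
    """
    if not reference_pass and not sample_pass:
        return ("N", 9)  # step 418
    if not reference_pass:
        return ("N", 4)  # step 422
    if not sample_pass:
        return ("N", 2)  # step 426
    return None  # both passed: continue with the ratio-of-ratios test (step 428)


call, code = intensity_gate(False, True)
print(call, code, CONFIDENCE[code])
# N 4 Sample sufficient intensity, insufficient intensity in reference
```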

In this example, both the reference and sample sequence probes have sufficient intensity to call the unknown base. At step 428 the ratio of each reference ratio to the corresponding sample ratio is calculated for each base type. Thus, the ratio A:A (column 4 to column 9) is placed in column 13 of the analysis table, the ratio C:C (column 5 to column 10) in column 14, the ratio G:G (column 6 to column 11) in column 15, and the ratio T:T (column 7 to column 12) in column 16. These ratios are calculated as follows:

A:A -> 7.2/6.4 = 1.1
C:C -> 9.9/2.3 = 4.3
G:G -> 1.0/1.0 = 1.0
T:T -> 5.6/14.5 = 0.4
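The ratio-of-ratios step can be sketched in a few lines of Python. The function name `ratio_of_ratios` is hypothetical. The example feeds in the rounded column values printed above, so its rounded output reproduces the four figures listed.

```python
def ratio_of_ratios(reference_ratios, sample_ratios):
    """Step 428: divide each reference ratio (columns 4-7) by the matching
    sample ratio (columns 9-12), giving columns 13-16."""
    return {base: reference_ratios[base] / sample_ratios[base] for base in "ACGT"}


rr = ratio_of_ratios(
    {"A": 7.2, "C": 9.9, "G": 1.0, "T": 5.6},
    {"A": 6.4, "C": 2.3, "G": 1.0, "T": 14.5},
)
print({base: round(value, 1) for base, value in rr.items()})
# {'A': 1.1, 'C': 4.3, 'G': 1.0, 'T': 0.4}
```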

The unknown base is called by comparing these ratios of ratios to two predetermined values as follows. 

At step 430, if all the ratios of ratios (columns 13 to 16 of the analysis table) are less than a predetermined lower ratio cutoff, the unknown base is assigned the code of the reference wild-type as shown at step 432. Thus, the code for the reference wild-type (as shown in column 2) would be placed in column 17 and a confidence code of 0 would be placed in column 18 of the analysis table.

At step 434, if all the ratios of ratios are less than a predetermined upper ratio cutoff, the unknown base is assigned an ambiguity code indicating that the unknown base may be any one of the bases whose complementary ratio of ratios is greater than the lower ratio cutoff and less than the upper ratio cutoff, as shown at step 436. Thus, if the ratios of ratios for A:A, C:C, and G:G are all greater than the lower ratio cutoff and less than the upper ratio cutoff, the unknown base would be assigned the code B (meaning "not A"). This is because the ratios of ratios are complementary to their respective bases as follows:
A:A -> T
C:C -> G
G:G -> C
T:T -> A
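Steps 430 through 436 can be sketched in Python with a complement table and the IUPAC ambiguity codes. This is a hedged illustration: the names `call_from_ratio_of_ratios`, `COMPLEMENT` and `IUPAC` are hypothetical, the cutoffs of 1.05 and 2.0 and the example ratios are invented for the demonstration (the actual predetermined cutoffs are not given here), and the function returns `None` for outcomes this section does not describe.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# IUPAC nucleotide codes keyed by the set of bases each code stands for.
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT",
}.items()}


def call_from_ratio_of_ratios(rr, wild_type, lower, upper):
    """Steps 430-436 of the reference method.

    rr: ratios of ratios keyed by base, e.g. {"A": 1.1, "C": 4.3, ...}.
    wild_type: code of the reference wild-type base (column 2).
    lower, upper: the predetermined lower and upper ratio cutoffs.
    Returns None for outcomes this section does not spell out.
    """
    if all(value < lower for value in rr.values()):
        return wild_type  # step 432
    if all(value < upper for value in rr.values()):
        # Step 436: complements of the bases whose ratio of ratios lies
        # strictly between the two cutoffs.
        candidates = {COMPLEMENT[b] for b, v in rr.items() if lower < v < upper}
        if candidates:
            return IUPAC[frozenset(candidates)]
    return None


# Hypothetical ratios: A:A, C:C and G:G fall between the cutoffs, T:T does not.
print(call_from_ratio_of_ratios({"A": 1.5, "C": 1.8, "G": 1.2, "T": 0.4}, "C", 1.05, 2.0))  # B
```

With these inputs the candidate set is {T, G, C}, which the IUPAC table maps to B ("not A"), matching the worked example in the text.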



ratios of ratios are as follows:

A:A -> 1.1
C:C -> 4.3
G:G -> 1.0
T:T -> 0.4



increase over the other "mutation" probes. ... reference method. As in the previous imple-
If the base intensity associated with the reference wild-type is not greater than the background difference cutoff, the wild-type sequence would fail to have sufficient intensity as shown at step 506. Otherwise, at step 508 the wild-type sequence would pass by having sufficient intensity.

At step 510 calculations are performed on the background subtracted base intensities of the reference sequence in order to "normalize" the intensities. Each position in the reference sequence has four background subtracted base intensities associated with it. The ratio of the intensity of each base to the sum of the intensities of all four possible bases is calculated, resulting in four ratios, one for each base, as shown in the data table. Thus, the following ratios would be calculated at each position in the reference sequence:

A ratio = A/(A + C + G + T)
C ratio = C/(A + C + G + T)
G ratio = G/(A + C + G + T)
T ratio = T/(A + C + G + T)

At position 241, the A ratio would be the wild-type ratio. These ratios are generally calculated in order to "normalize" the intensity data, as the photon counts may vary widely from experiment to experiment; the ratios provide a way of reconciling the intensity variations across experiments. If the photon counts do not vary widely from experiment to experiment, the probe intensities do not need to be "normalized."
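The normalization used at steps 510 and 518 divides each base intensity by the sum of all four. A minimal Python sketch follows; the name `normalize` is hypothetical, and because the data-table values for position 241 are not reproduced here, the example borrows the sample intensities used in the statistical-method example later in this section.

```python
def normalize(adjusted):
    """Return each background-subtracted base intensity divided by the sum of all four."""
    total = sum(adjusted.values())
    return {base: value / total for base, value in adjusted.items()}


ratios = normalize({"A": 310, "C": 50, "G": 26, "T": 100})
print({base: round(r, 2) for base, r in ratios.items()})
# {'A': 0.64, 'C': 0.1, 'G': 0.05, 'T': 0.21}
```

Because the four ratios always sum to 1, intensities from experiments with very different photon counts land on a common scale.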

At step 512 the highest base intensity associated with the sample sequence is checked to see if it has sufficient intensity to call the unknown base. The intensity is checked by determining if the highest intensity sample base is greater than the background difference cutoff. If the intensity is not greater than the background difference cutoff, the sample sequence fails to have sufficient intensity as shown at step 514. Otherwise, at step 516 the sample sequence passes by having sufficient intensity.

At step 518 calculations are performed on the background subtracted base intensities of the sample sequence in order to "normalize" the intensities. Each position in the sample sequence has four background subtracted base intensities associated with it. The ratios of the intensity of each base to the sum of the intensities of all four possible bases are calculated, resulting in four ratios, one for each base, as shown in the data table.

At step 520 if either the reference or sample sequences failed to have sufficient intensity, the unknown base is 
assigned the code N (insufficient intensity) as shown at step 522. 

At step 524 the normalized base intensities of the reference sequence are subtracted from the normalized base 
intensities of the sample sequence. Thus, at each position the following calculations are performed: 

A Difference = Sample A Ratio - Reference A Ratio
C Difference = Sample C Ratio - Reference C Ratio
G Difference = Sample G Ratio - Reference G Ratio
T Difference = Sample T Ratio - Reference T Ratio

where the reference and sample ratios are calculated at steps 510 and 518, respectively. The base differences resulting from these calculations are shown in the data table.

At step 526 each position is checked to see if there is both a base difference greater than an upper difference cutoff and a base difference lower than a lower difference cutoff. For example, Fig. 11C shows a graph of the normalized sample base intensities minus the normalized reference base intensities. Suppose that the upper difference cutoff is 0.15 and the lower difference cutoff is -0.15, as shown by the horizontal lines in Fig. 11C. At the mutation position (labeled with a reference 0), the G difference is 0.28, which is greater than 0.15, the upper difference cutoff. Similarly, the A difference is -0.32, which is less than -0.15, the lower difference cutoff. As there is a base difference above the upper difference cutoff and a base difference below the lower difference cutoff, there may be a mutation at this position.

If there is neither a base difference above the upper difference cutoff nor a base difference below the lower difference 
cutoff, the base at that position is assigned the code of the reference wild-type base as shown at step 528. 

At step 530 the ratio of the highest background subtracted base intensity in the sample to the background subtracted reference wild-type base intensity is calculated. For example, at the mutation position 241 in the data table, the highest background subtracted base intensity in the sample is 571 (base G). The background subtracted reference wild-type base intensity is 385 (base A). The ratio 571:385 is calculated and results in 1.48, as shown in the data table.
At step 532 these ratios are compared to the ratio at a neighboring position. ... For example, the ratio at position 242 (which equals 1.02) is subtracted from the ratio at position 241 (which equals 1.48). ...
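Steps 530 and 532, as illustrated by the position 241 example, can be sketched in Python. The function names are hypothetical. The sketch only computes the ratio and the neighboring-position difference; it does not apply the step 534 detection threshold, which is not stated here.

```python
def wild_type_ratio(sample_adjusted, reference_wild_type):
    """Step 530: highest background-subtracted sample intensity divided by the
    background-subtracted reference wild-type intensity."""
    return max(sample_adjusted.values()) / reference_wild_type


def neighbor_difference(ratios, position):
    """Step 532 as in the example: ratio at `position` minus ratio at `position + 1`."""
    return ratios[position] - ratios[position + 1]


# Only the G intensity at position 241 is given in the text, so it is the only entry here.
ratio_241 = wild_type_ratio({"G": 571}, 385)
print(round(ratio_241, 2))  # 1.48
print(round(neighbor_difference({241: 1.48, 242: 1.02}, 241), 2))  # 0.46
```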

If a mutation is not detected at step 534, the base at that position is assigned the code of the reference wild-type ...

... been used to compare the reproducibility of experiments in terms of base calling.






IV. Statistical Method

The statistical method is a method of calling bases in a sample nucleic acid sequence ... C5AAAA-3' (SEQ ID NO: 4). Suppose a reference sequence has the sequence 5'-AAAACCGAAAA-3' ...

sequence:



Reference      Sample
3'-TGAC        3'-GACT
3'-TGCC        3'-GCCT
3'-TGGC        3'-GGCT
3'-TGTC        3'-GTCT



The "mutation" probes shown for the reference sequence may be from only one experiment; the other experiments may have different "mutation" probes, chip wild-types, tiling methods, and the like. Although each fluorescence intensity comes from a probe, each probe can be identified by its unique mutation base, so the probe intensities may be identified by their respective bases as follows:

Reference            Sample
3'-TGAC -> A         3'-GACT -> A
3'-TGCC -> C         3'-GCCT -> C
3'-TGGC -> G         3'-GGCT -> G
3'-TGTC -> T         3'-GTCT -> T

Thus, base A of the reference sequence will be described as having an intensity which corresponds to the intensity of 

the mutation probe with the mutation base A. The statistical method will now be described as calling the unknown base 

in the sample sequence by using this example. 

Fig. 12 illustrates the high level flow of the statistical method. At step 602 the four base intensities associated with the sample sequence and each of the multiple reference experiments are adjusted by subtracting the background or "blank" cell intensity from each base intensity. Preferably, if a base intensity is then less than or equal to zero, the base intensity is set equal to a small positive number to prevent division by zero or negative numbers.

At step 604 the intensities of the reference wild-type bases in the multiple experiments are checked to see if they all have sufficient intensity to call the unknown base. The intensities are checked by determining if the intensity of the reference wild-type base of an experiment is greater than a predetermined background difference cutoff. The wild-type probe shown earlier for the reference sequence is 3'-TGGC, and thus the G base intensity is the wild-type base intensity. These steps are analogous to steps in the other two methods described herein.

If the intensity of any one of the reference wild-type bases is not greater than the background difference cutoff, the wild-type experiments fail to have sufficient intensity as shown at step 606. Otherwise, at step 608 the wild-type experiments pass by having sufficient intensity.

At step 610 calculations are performed on the background subtracted base intensities of each of the reference experiments in order to "normalize" the intensities. Each reference experiment has four background subtracted base intensities associated with it: one wild-type and three for the other possible bases. In this example, the G base intensity is the wild-type, and the A, C, and T base intensities are the "other" intensities. The ratios of the intensity of each base to the sum of the intensities of all four possible bases are calculated, giving one wild-type ratio and three "other" ratios. Thus, the following ratios would be calculated:

A ratio = A/(A + C + G + T)
C ratio = C/(A + C + G + T)
G ratio = G/(A + C + G + T)
T ratio = T/(A + C + G + T)

where the G ratio is the wild-type ratio and the A, C, and T ratios are the "other" ratios. These four ratios are calculated for each reference experiment. Thus, if the number of reference experiments is n, 4n ratios would be calculated. These ratios are generally calculated in order to "normalize" the intensity data, as the photon counts may vary widely from experiment to experiment. However, if the probe intensities do not vary widely from experiment to experiment, the probe intensities do not need to be "normalized."

At step 612 statistics are prepared for the ratios calculated for each of the reference experiments. As stated before, each reference experiment will be associated with one wild-type ratio and three "other" ratios. The mean and standard deviation are calculated for all the wild-type ratios. The mean and standard deviation are also calculated for each of the other ratios, resulting in three other means and standard deviations, one for each of the bases that is not the wild-type base. Therefore, the following would be calculated:

Mean and standard deviation of A ratios
Mean and standard deviation of C ratios
Mean and standard deviation of G ratios
Mean and standard deviation of T ratios

where, in this example, the G statistics are the wild-type mean and standard deviation and the A, C, and T statistics are known collectively as the "other" means and standard deviations.

Suppose that the preceding calculations produced the following data:

A ratios -> mean = 0.16, std. dev. = 0.003
C ratios -> mean = 0.03, std. dev. = 0.002
G ratios -> mean = 0.71, std. dev. = 0.050
T ratios -> mean = 0.11, std. dev. = 0.004
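Step 612 can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions: `reference_statistics` is a hypothetical name, the three experiments below are invented values chosen so that the per-base means and standard deviations match the figures above, and the sketch uses the sample standard deviation (`statistics.stdev`, which needs at least two experiments) because the text does not say which estimator is intended.

```python
import statistics


def reference_statistics(experiments):
    """Step 612: mean and standard deviation of each base's normalized ratio
    across the reference experiments.

    experiments: list of dicts mapping base -> normalized ratio, one per experiment.
    Uses the sample standard deviation; the estimator is an assumption.
    """
    stats = {}
    for base in "ACGT":
        values = [experiment[base] for experiment in experiments]
        stats[base] = (statistics.mean(values), statistics.stdev(values))
    return stats


experiments = [  # hypothetical normalized ratios from three reference experiments
    {"A": 0.157, "C": 0.028, "G": 0.66, "T": 0.106},
    {"A": 0.160, "C": 0.030, "G": 0.71, "T": 0.110},
    {"A": 0.163, "C": 0.032, "G": 0.76, "T": 0.114},
]
stats = reference_statistics(experiments)
```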

... wild-type experiments. ... sample sequence fails to have sufficient intensity as shown at step 616. Otherwise, at step 618 the sample sequence ...

... are calculated:

A ratio = A/(A + C + G + T)
C ratio = C/(A + C + G + T)
G ratio = G/(A + C + G + T)
T ratio = T/(A + C + G + T)

where the G ratio is the wild-type ratio and the A, C, and T ratios are the "other" ratios.

Suppose the background subtracted intensities associated with the sample are as follows:

A -> 310
C -> 50
G -> 26
T -> 100

Then, the corresponding ratios would be as follows: 

A ratio = 310 / (310 + 50 + 26 + 100) = 0.64 

C ratio = 50 / (310 + 50 + 26 + 100) = 0.10 

G ratio = 26 / (310 + 50 + 26 + 100) = 0.05 

T ratio = 100 / (310 + 50 + 26 + 100) = 0.21 

At step 622, if either the reference experiments or the sample sequence failed to have sufficient intensity, the unknown base is assigned the code N (insufficient intensity) as shown at step 624.

At step 626 the wild-type and "other" ratios associated with the sample sequence are compared to statistical expressions ...

Thus, there is a standard deviation cutoff for each of the bases A, C, G, and T. These base-specific standard deviation cutoffs allow the unknown base to be called with higher precision because each cutoff can be set to a different value. Suppose the standard deviation cutoffs are set as follows:

A standard deviation cutoff -> 4
C standard deviation cutoff -> 2
G standard deviation cutoff -> 8
T standard deviation cutoff -> 4

The wild-type base ratio associated with the sample is compared to a corresponding statistical expression:

WT ratio > WT mean - (WT std. dev. * WT base std. dev. cutoff)

where the WT base std. dev. cutoff is the standard deviation cutoff for the wild-type base. As the wild-type base is G, the above comparison resolves to the following:

0.05 > 0.71 - (0.050 * 8)
0.05 > 0.31

which is not a true expression (0.05 is not greater than 0.31).

Each of the "other" ratios associated with the sample is compared to a corresponding statistical expression:

Other ratio > Other mean + (Other std. dev. * Other base std. dev. cutoff)

where the Other base std. dev. cutoff is the standard deviation cutoff for the particular "other" base. Thus, the above comparison resolves to the following three expressions:

A -> 0.64 > 0.16 + (0.003 * 4)
     0.64 > 0.17
C -> 0.10 > 0.03 + (0.002 * 2)
     0.10 > 0.03
T -> 0.21 > 0.11 + (0.004 * 4)
     0.21 > 0.13

which are all true expressions.

At step 628 if only the wild-type ratio of the sample sequence was greater than the statistical expression, the unknown 
base is assigned the code of the reference wild-type base as shown at step 630. 

At step 632, if one or more of the "other" ratios of the sample sequence were greater than their respective statistical expressions, the unknown base is assigned an ambiguity code indicating that the unknown base may be any one of the complements of these bases, including the reference wild-type. In this example, the "other" ratios for A, C, and T were all greater than their corresponding statistical expressions. Thus, the unknown base would be called as one of the complements of these bases, represented by the subset T, G, and A, and would be assigned the code D (meaning "not C").

If none of the ratios are greater than their respective statistical expressions, the unknown base is assigned the code 
X (insufficient discrimination) as shown at step 636. 
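The comparisons and calls of steps 626 through 636 can be sketched in Python. This is a hedged illustration, not the patented code. `statistical_call`, `COMPLEMENT` and `IUPAC` are hypothetical names. Two assumptions are built in. First, each probe substitution base maps to its complement in the sample, consistent with the example's mapping of A, C, T to T, G, A. Second, "including the reference wild-type" is read as adding the wild-type's complement to the ambiguity set only when the wild-type ratio also passes its expression.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
IUPAC = {frozenset(bases): code for code, bases in {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT",
}.items()}


def statistical_call(sample_ratios, stats, cutoffs, wild_type):
    """Steps 626-636 of the statistical method.

    sample_ratios: normalized sample ratios keyed by probe base.
    stats: base -> (mean, standard deviation) over the reference experiments.
    cutoffs: base -> standard deviation cutoff.
    wild_type: the wild-type probe base (G in the example).
    """
    mean, sd = stats[wild_type]
    wild_type_passes = sample_ratios[wild_type] > mean - sd * cutoffs[wild_type]
    others = {
        base for base in "ACGT"
        if base != wild_type
        and sample_ratios[base] > stats[base][0] + stats[base][1] * cutoffs[base]
    }
    if not others:
        # Step 630 if only the wild-type passed, otherwise step 636 (code X).
        return COMPLEMENT[wild_type] if wild_type_passes else "X"
    if wild_type_passes:
        others.add(wild_type)
    return IUPAC[frozenset(COMPLEMENT[base] for base in others)]  # step 632


stats = {"A": (0.16, 0.003), "C": (0.03, 0.002), "G": (0.71, 0.050), "T": (0.11, 0.004)}
cutoffs = {"A": 4, "C": 2, "G": 8, "T": 4}
sample = {"A": 0.64, "C": 0.10, "G": 0.05, "T": 0.21}
print(statistical_call(sample, stats, cutoffs, "G"))  # D
```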

The statistical method provides accurate base calling because it utilizes statistical data from multiple reference experiments to call the unknown base. The statistical method has also been used to implement confidence estimates and calling of mixed sequences.
V. Pooling Processing

The present invention provides pooling processing, which is a method of processing reference and sample nucleic acid sequences ...

... and sample nucleic acid sequences are labeled with different fluorescent markers emitting light ... the nucleic acids may be labeled with other types of markers including distinguishable ...

... reference and sample nucleic acid sequences are labeled with different color fluorescent markers, the labeled reference and sample nucleic acid sequences are then combined and processed together. An apparatus for ... labeled with different markers is ...

Fig. 13 illustrates ... pooling processing of a reference and sample nucleic acid sequence. At step 702 a reference nucleic acid sequence ...

... dye that, upon excitation, emits light of a different wavelength than that of the fluorescent dye of the ... For example, the sample nucleic acid sequence may be marked with rhodamine. Alternatively, the ... sequence may be marked by attaching ...

to streptavidin labeled with phycoerythrin. Of course, either sequence may be marked with these or other dyes or other ... (e.g., radioactive) as long as the other sequence is marked with a marker that is distinguishable.

At step ...

... in the same manner as for only one labeled sequence. At step 708 the sequences are fragmented. The fragmented nucleic acid sequences are then hybridized on a chip containing probes as shown at step 710.

At step 712 a scanner generates image files that indicate the locations where the labeled nucleic acids bound to the chip. ... some overlap between the ...

... "reference" and "sample." In general, the scanner generates an image file by ... light on the hybridized chip and detecting the fluorescent light that is emitted. The ... light can be identified by the wavelength ... is about 530 nm while that of a typical rhodamine dye is about 580 nm.

The scanner creates an image file for the data associated with each fluorescent marker, indicating the locations where the ... labeled nucleic acid bound to the chip. Based upon an analysis ... it is possible to extract information such as the monomer sequence ...

... pooling processing reduces variations across individual experiments ... Although pooling processing has been described as being used to improve the combined processing of reference and sample nucleic acid sequences, the process may also be used for two re...

... nucleic acid sequence. ... methods are highly accurate in identifying single mutations ... and ... false positives for mutations, where a false positive is a base that has ... these methods utilize hybridization data from more than one base position to identify the likely position of ... interrogation position on the probes is utilized to more accurately identify likely mutations, which makes ... base calling methods. These methods may be advantageously ... the base calling methods described herein to efficiently and accurately sequence a sample nucleic acid sequence.

As discussed earlier in reference to Fig. 8, the fluorescent intensities of cells near an interrogation position having a mutation ... which creates "dark regions" around the mutation. These lower fluorescent intensities ...

... of these "dark regions" may be utilized to identify mutations and false positives.

... sample sequence and a reference ... phycoerythrin. The sample and reference sequences are known, and the sample sequence is ... to the reference sequence except for mutations at certain known positions. The sample and reference sequences were then ... wild-type ...

20-mer probes with the interrogation position of each probe being at the 12th base position in the probe. ...

... the sample ... sequence ...

... The photon counts of the probes in the wild-type cells are plotted on a logarithmic scale on the ... The photon counts range from 1 (representing a de minimis value) to 100,000. The photon counts for the probes in ...

... wild-type cell numbers 11, ... probes in these cells. The cells are left "blank" in order to minimize diffraction edges and thus, the location of these blank cells is known. Consequently, the intermittent wild-type cells that have a photon count of 1 do not represent erroneous data.
data. 

As shown in Fig. 14A, the scaled photon counts for the wild-type probes hybridizing with the sample and reference sequences are almost the same except for two "bubbles." A bubble 730 has a top curve defined by the photon counts of the wild-type probes that hybridized with the reference sequence and a bottom curve defined by the photon counts of the wild-type probes that hybridized with the sample sequence. Following bubble 730, there is a section 732 where the photon counts for the wild-type probes hybridizing with the sample and reference sequences are almost the same. After section 732 is another bubble 734, which again has a top curve defined by the hybridization of the reference sequence and a bottom curve defined by the hybridization of the sample sequence. Another partial bubble is shown to the right of bubble 734.

Each bubble in Fig. 14A corresponds to a dark region surrounding a single mutation. Because the wild-type probes at and surrounding a mutant position in the sample sequence contain a single base mismatch with the sample sequence, the hybridization is relatively lower, which results in lower photon counts. Much information about the sample sequence may be acquired by a detailed analysis of these bubble regions.

The width of the bubble indicates whether there is a false positive, a single mutation, or multiple mutations. If there is a single mutation, the width of the bubble should be approximately equal to the probe length. For example, Fig. 14A was produced utilizing 20-mer probes. Accordingly, bubbles 730 and 734 are approximately 20 wild-type cells wide, indicating that both of these bubbles were produced by single mutations. The width of the dark region resulting from a single mutation is believed to be approximately equal to the probe length because each of the probes in this region has a single base mismatch with the sample sequence.

If the width of the bubble is substantially less than the probe length, the bubble may represent a false positive. For example, assume that at wild-type cell number 45 in Fig. 14A, the hybridization of the wild-type probe with the sample sequence was very low (e.g., around 1000 photon counts). A base calling algorithm that calls the bases according to the intensities among the cells at that position may indicate that there is a mutation at this position. However, the low photon counts may be due to dust on the chip and not due to lower hybridization. Since the width of this bubble would be 1, which is substantially lower than the probe length of 20, the lower photon count at wild-type cell 45 would not be due to a mutation (i.e., there is no dark region surrounding that position).

If the width of the bubble is substantially more than the probe length, the bubble may represent multiple mutations. In other words, the bubble may be produced by more than one overlapping dark region. The analysis of such a bubble will be discussed in more detail in reference to Fig. 14C.

Returning to Fig. 14A, bubbles 730 and 734 are each approximately 20 bases wide, indicating with a high degree of certainty that each bubble represents a single mutation. Furthermore, the bubbles may be analyzed to determine the probable location of the mutations within the bubbles. As mentioned earlier, the 20-mer probes on the chip had an interrogation position at the 12th base position in the probe. Thus, the base at the 12th base position is the base that varies among the related WT-, A-, C-, G- and T-cells. Accordingly, the mutation should be located at the 12th position in the bubble.

The actual mutation in bubble 730 occurs at the 12th position (from the left). Additionally, the actual mutation in bubble 734 occurs at the 12th position (from the left). Thus, as the graph shows, there are 11 bases to the left of each mutation and 8 bases to the right of each mutation. By utilizing the location of the interrogation position within the probes, the present invention can help to identify the probable location of a mutation within a dark region or bubble.
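A Python sketch of how bubbles might be found in the paired wild-type photon-count traces and how the probable mutation position follows from the interrogation position. Everything here is an assumption for illustration. The names `find_bubbles` and `probable_mutation_index` are hypothetical, the "at least `fold` times lower" darkness criterion is invented (the text gives none), and indices are zero-based cell positions. Following the text, the mutation is placed at the 12th cell from the left of a single-mutation bubble.

```python
def find_bubbles(reference, sample, fold=2.0):
    """Return (start, end) index pairs of runs where the sample photon count
    is at least `fold` times lower than the reference (hypothetical criterion)."""
    bubbles, start = [], None
    for i, (ref_count, sample_count) in enumerate(zip(reference, sample)):
        dark = sample_count * fold <= ref_count
        if dark and start is None:
            start = i
        elif not dark and start is not None:
            bubbles.append((start, i - 1))
            start = None
    if start is not None:
        bubbles.append((start, len(reference) - 1))
    return bubbles


def probable_mutation_index(bubble, interrogation_position=12):
    """Place the mutation at the interrogation position, counted from the
    left edge of a single-mutation bubble."""
    start, _ = bubble
    return start + interrogation_position - 1


reference = [1000] * 50
sample = [1000] * 5 + [100] * 20 + [1000] * 25  # one synthetic 20-cell dark region
for bubble in find_bubbles(reference, sample):
    print(bubble, probable_mutation_index(bubble))  # (5, 24) 16
```

Blank cells, which read 1 in both traces, are never flagged as dark under this criterion, although one falling inside a dark region would split it in two.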

Additionally, because this method identifies specific locations that may have a mutation, more efficient base calling may be achieved. For example, an analysis of bubble 730 indicates that there is likely to be a single mutation around wild-type cell 15. Typically, most errors in base calling occur in the dark regions surrounding a mutation. Many false positives in this dark zone can now be eliminated because they are incompatible with the bubble size (which indicates a single mutation, for example). Also, by clearly identifying a "mismatch zone," we can now apply algorithms that factor in the effect of a mismatch or multiple mismatches.

Additionally, the shape of the bubble may indicate what mutation has occurred. Fig. 14B shows a hypothetical graph of the fluorescent intensities vs. cell locations for wild-type probes hybridizing with two sample sequences and one reference sequence. A C-A mismatch will be more destabilizing to probe hybridization than a U-G mismatch. As shown, the more destabilizing C-A mismatch results in a larger volume bubble. The shape of the bubble may be utilized to identify the particular mutation by pattern matching against bubbles stored in a library.

Fig. 14C shows a graph of the fluorescent intensities (photon counts) of the wild-type probes hybridizing with the sample and reference sequences. A single bubble 750 is flanked on either side by regions 752 and 754, which do not contain a mutation. The graph was produced from a chip containing 20-mer probes with an interrogation position at base 12 of the probes.

As shown, bubble 750 is 27 bases wide, indicating that the bubble was produced from the dark regions surrounding more than one mutation, as 27 is greater than 20, the length of the probes. In addition to providing information that there are multiple mutations, analysis of the bubble indicates the probable position of two of the mutations. Because the
right of the reference sequence that is the chip wild-type for the current analysis. Although the chip wild-type sequence has associated fluorescence intensities, the other reference sequences shown below the chip wild-type may be known sequences that have not been tiled on the chip. These may or may not have associated fluorescence intensities. The reference sequences other than the chip wild-type are used for sequence comparisons and may be in the form of simple ASCII text files.

Sample sequence area 816 is where sample or unknown experimental sequences are displayed for comparison with the reference sequences. The sample sequence area is divided into a sample name subarea 824 and a sample base subarea 826. The sample name subarea is shown with the filenames that contain the sample sequences. The filename extensions indicate the method used to call the sample sequence, where ".cq#" denotes the intensity ratio method, ".rq#" denotes the reference method, and ".sq#" denotes the statistical method (# indicates the unit on the chip). The sample base subarea contains the bases of the sample sequences. The bases of the sample sequences are identified by the codes previously set forth, which for the most part conform to the IUPAC standard.
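As an illustration of the filename convention, the Python sketch below recovers the calling method and chip unit from a sample filename. The function name `calling_method`, the sample filename, and the assumption that "#" is one or more decimal digits are all hypothetical; the text says only that "#" indicates the unit on the chip.

```python
import re

METHODS = {"cq": "intensity ratio method", "rq": "reference method", "sq": "statistical method"}


def calling_method(filename):
    """Return (method, chip unit) for a filename such as 'sample.rq2', or None.

    Assumes the unit number '#' is one or more decimal digits.
    """
    match = re.search(r"\.(cq|rq|sq)(\d+)$", filename)
    if match is None:
        return None
    return METHODS[match.group(1)], int(match.group(2))


print(calling_method("sample.rq2"))  # ('reference method', 2)
```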

Window 802 also contains a message panel 828. When the user selects a base with an input device in the reference or sample base subarea, the base becomes highlighted and the pathname of the file containing the base is displayed in the message panel. The base's position in the nucleic acid sequence is also displayed in the message panel.

In pull down menu File 804, the user is able to load files of experimental sequences that have been tiled and scanned on a chip. There is a chip wild-type associated with each experimental sequence. The chip wild-type associated with the first experimental sequence loaded is read and shown as the chip wild-type in reference sequence area 814. The user is also able to load files of known nucleic acid sequences as reference sequences for comparison purposes. As before, these known reference sequences may or may not have associated probe intensity data. Additionally, in this menu the user is able to save sequences that are selected on the screen into a project file that can be loaded at a later time. The project file also contains any linkage of the sequences, where sequences are linked for comparison purposes. Sequences to be saved, both reference and sample, are chosen by selecting the sequence filename with an input device in the reference or sample name subareas.

In pull down menu Edit 806, the user is able to link together sequences in the reference and sample sequence
areas. After the user has selected one reference and one or more sample sequences, the sample sequences can be
linked to the reference sequence by selecting an entry in the pull down menu. Once the sequences are linked, a link
number 830 is displayed next to each of the linked sequences. Each group of linked sequences is associated
with a unique link number, so the user can easily identify which sequences are linked together. Linking sequences
permits the user to more easily compare the linked sequences. The user is also able to display and remove links from
this menu.
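The linking behavior of the Edit menu, together with the difference highlighting it enables, can be sketched as follows. The following names and assumptions are all hypothetical:

* the class `LinkTable`;
* the function `differing_positions`;
* the assumption that linked sequences are already aligned base for base.

The comparison is a plain character match and does not treat ambiguity codes specially.

```python
from dataclasses import dataclass, field


@dataclass
class LinkTable:
    """Hypothetical bookkeeping for linked reference/sample sequences.

    Each link groups one reference with one or more samples and is
    identified by a unique link number, as described for the Edit menu.
    """
    links: dict[int, tuple[str, list[str]]] = field(default_factory=dict)
    _next: int = 1

    def link(self, reference: str, samples: list[str]) -> int:
        """Link samples to a reference and return the new link number."""
        if not samples:
            raise ValueError("select at least one sample sequence")
        number = self._next
        self._next += 1  # link numbers are never reused
        self.links[number] = (reference, list(samples))
        return number

    def unlink(self, number: int) -> None:
        """Remove a link group; unknown numbers are ignored."""
        self.links.pop(number, None)


def differing_positions(reference: str, sample: str) -> list[int]:
    """1-based positions where the sample base differs from the reference.

    Assumes the two sequences are already aligned base for base.
    """
    return [i for i, (r, s) in enumerate(zip(reference, sample), start=1) if r != s]
```

For example, comparing SEQ ID NO:1 with SEQ ID NO:2 from the sequence listing, `differing_positions("ATGTGGACAGTTGTA", "ATGTGGATAGTTGTA")` returns `[8]`. A display layer could then color the base at that position.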

In pull down menu View 808, the user is able to display intensity graphs for selected bases. Once a base is selected
in the reference or sample base subarea, the user may request an intensity graph showing the hybridized probe intensities
of the selected base and of a delineated neighborhood of bases near the selected base. Intensity graphs may be
displayed for one or multiple selected bases. The user is also able to prepare comment files and reports in this menu.

Fig. 17 illustrates an intensity graph window for a selected base at position 120 (SEQ ID NO:30 and SEQ ID NO:31).
The filename containing the sequence data is displayed at 904. The graph shows the intensities for each of the hybridized
probes associated with a base. Each grouping of four vertical bars on the graph, labeled "a", "c", "g", and
"t" on line 906, shows the background-subtracted intensities of probes having the indicated substitution base. In one
embodiment, the called bases are shown in red. The wild-type base is shown at line 908, the called base is shown at
line 910, and the base position is shown at line 912. In Fig. 17, the base selected is at position 120, as shown by arrow
914. The wild-type base at this position is T; however, the called base is M, which means the base is either A or C (amino).
The user is able to use intensity graphs to visually compare the intensities of each of the possible calls.
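Interpreting a call such as M requires the IUPAC ambiguity table. The sketch below encodes the standard IUPAC nucleotide codes only. It does not model any of the document's non-standard codes. The names `IUPAC`, `possible_bases`, and `code_for` are illustrative.

```python
# Standard IUPAC nucleotide codes; each value lists the bases it stands for,
# in alphabetical order so that the reverse lookup can compare strings.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def possible_bases(code: str) -> set[str]:
    """Expand a called base into the set of bases it may stand for."""
    try:
        return set(IUPAC[code.upper()])
    except KeyError:
        raise ValueError(f"not an IUPAC nucleotide code: {code!r}") from None


def code_for(bases: set[str]) -> str:
    """Inverse lookup: the single IUPAC code covering exactly these bases."""
    target = "".join(sorted(b.upper() for b in bases))
    for code, members in IUPAC.items():
        if members == target:
            return code
    raise ValueError(f"no IUPAC code for {bases!r}")
```

For example, `possible_bases("M")` gives `{"A", "C"}`, matching the call in Fig. 17. `code_for({"A", "G"})` gives `"R"`, the purine code.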

Fig. 18 illustrates multiple intensity graph windows for selected bases (SEQ ID NO:32, SEQ ID NO:33, SEQ ID
NO:34, and SEQ ID NO:35). There are three intensity graph windows 1002, 1004, and 1006 as shown. Each window
may be associated with a different experiment, where the sequence analyzed in the experiment may be either a reference
(if it has associated probe intensity data, as the chip wild-type does) or a sample sequence. The windows are aligned, and
a rectangular box 1008 shows the selected bases' position in each of the sequences (position 162 in Fig. 18). The
rectangular box aids the user in identifying the selected bases.

Referring again to Fig. 16, in pull down menu Highlight 810, the user is able to compare the sequences of references
and samples. At least four comparisons are available to the user: sample sequences to the chip
wild-type sequence, sample sequences to any reference sequences, sample sequences to any linked reference
sequences, and reference sequences to the chip wild-type sequence. For example, after the user has linked a reference
and a sample sequence, the user can compare the bases in the linked sequences. Bases in the sample sequence that
differ from the reference sequence are then indicated to the user on the display device (e.g., the base is shown
in a different color). In another example, the user is able to perform a comparison that helps identify sample sequences.
After a sample is linked to multiple reference sequences, each base in the sample sequence that does not match the
wild-type sequence is checked to see if it matches one of the linked reference sequences. The bases that match a linked






EP0 717113A2 

ID NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID
NO:17, and SEQ ID NO:18). A window 1102 is shown with a chip wild-type sequence 1104 and a mutant sequence 1106. The
mutant sequence differs from the chip wild-type [...] rectangular box 1108. The chip wild-type
and mutant sequences are a region of the HIV Pol gene spanning [...] sequences are [...]

There are seven sample sequences that are [...] sequence. Thus, there are
actually solutions of proportions of the [...]

[...] of the sample solutions:



15 





Sample Solution    Chip Wild-Type:Mutant
1110               100:0
1112               90:10
1114               75:25
1116               50:50
1118               25:75
1120               10:90
1122               0:100



For example, sample solution 1114 contains 75% chip wild-type [...]
Now referring to the bases called in rectangular box 1108 [...]

[...] ambiguity IUPAC code denoting A or G (purine) [...]
75% to 50% chip wild-type sequence and from 25% to 50% mutant sequence.

[...] calls the base in this transition state. [...] intensity ratio method as having a mutation base G at the [...]

Sample solutions 1118, 1120, and 1122 are called by the [...]
of more than one nucleic acid sequence. [...] intensity ratio method incorrectly [...]

Fig. 20 illustrates the reference method correctly calling a [...] called the mutant base (SEQ ID NO:36, SEQ ID NO:37,
SEQ ID NO:38, and SEQ ID NO:39). There are three intensity graph windows 1202, 1204, and 1206 as shown. The [...]
box 1208 outlines the [...] of interest. Window 1202 shows [...] at that position. The intensity [...] However, the base [...]
in a reference sequence [...]






VII. Examples 
Example 1 

The intensity ratio method was used in sequence analysis of various polymorphic HIV-1 clones using a protease
chip. Single-stranded DNA of a 382 nt region was used with 4 different clones (HXB2, SF2, NY5, pPol4mut18). Results
were compared to results from an ABI sequencer. The results are illustrated below:

SUMMARY
ABI (sense) - 99.5% 
Chip (sense) - 98.1% 
ABI (antisense) - 98.6% 
Chip (antisense) - 99.1% 
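Example 1 relies on the intensity ratio method, whose core is stated in claims 3 to 5. A ratio of a higher probe intensity to a lower one is compared against a predetermined value of approximately 1.2. The sketch below makes three illustrative assumptions, none of which is the patented implementation:

* four background-subtracted intensities per position, one per substitution base;
* the ratio is taken between the brightest and second-brightest probes;
* an unresolved position is reported as N.

The function name `ratio_call` is hypothetical.

```python
def ratio_call(intensities: dict[str, float], threshold: float = 1.2) -> str:
    """Call a base from background-subtracted probe intensities.

    `intensities` maps each substitution base ('A', 'C', 'G', 'T') to the
    intensity of the probe carrying that base at the interrogation position.
    The brightest probe's base is called only if its intensity divided by the
    runner-up's exceeds `threshold`; otherwise 'N' (no call) is returned.
    The runner-up comparison and the 'N' fallback are assumptions.
    """
    if len(intensities) < 2:
        raise ValueError("need intensities for at least two substitution probes")
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    (best_base, best), (_, second) = ranked[0], ranked[1]
    if second <= 0:
        # Ratio undefined for a non-positive runner-up (assumption):
        # call the top base only if it has any positive signal at all.
        return best_base if best > 0 else "N"
    return best_base if best / second > threshold else "N"
```

With intensities of 900, 300, 120, and 80 for A, C, G, and T, the ratio is 3.0 and A is called. With 500 versus 450 the ratio is about 1.11, below 1.2, so the sketch makes no call.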
Example 2

HIV protease genotyping was performed using the described chips and CallSeq™ intensity ratio calculations. Samples
were evaluated from AIDS patients before and after ddI treatment. Results were confirmed with ABI sequencing.

Fig. 21 illustrates the output of the ViewSeq™ program with four pretreatment samples and four posttreatment samples
(SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, and SEQ ID NO:27). Note the
base change at position 207, where a mutation has arisen. Even though it is adjacent to two additional mutations (gt), the "a"
mutation has been properly detected.

The above description is illustrative and not restrictive. Many variations of the invention will become apparent to
those of skill in the art upon review of this disclosure. Merely by way of example, while the invention is illustrated with
particular reference to the evaluation of DNA (natural or unnatural), the methods can be used in the analysis from chips
with other materials synthesized thereon, such as RNA. The scope of the invention should, therefore, be determined
not with reference to the above description, but instead should be determined with reference to the appended claims
along with their full scope of equivalents.
SEQUENCE LISTING



(1) GENERAL INFORMATION: 



(B) STREET: De Ruyterkade 62

(C) CITY: Curacao

(E) COUNTRY: Netherlands Antilles

(F) POSTAL CODE (ZIP): none

(ii) TITLE OF INVENTION: Computer-Aided Visualization and
Analysis System for Sequence Evaluation

(iii) NUMBER OF SEQUENCES: 39

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: [...]

(D) SOFTWARE: PatentIn Release [...], Version #1.25 (EPO)



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (oligonucleotide)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
ATGTGGACAG TTGTA 
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ATGTGGATAG TTGTA 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
ATGTGGAKAG TTGTA 
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
AAAACTGAAA A 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
AAAACCGAAA A 
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
AAACCCAATC CACATCA 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AAACCCAGTC CACATCA 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
GGGGAAGCAG ATTTGGGTAC CACCCAAGTA T 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
GGGGAAGCAG ATTTGAAMAC CACCCAAGTA T 
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GCATTAGTAG AGATATGTAC AGAAATGGAA AAGGAAGGGA AAATTTCAAA 
AATTGGGCC 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

GCATTAGTAG AAATTTGTAC AGAGATGGAA AAGGAAGGGA AAATTTCAAA 
AATTGGGCC 
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

GCATTAGTAG AGATATGGAG AGRARDGGRA AXXXAAGGGA AAATTNNNAA 
AATTGGGCC 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

GCATTAGTAG AGATATGKAS AGRARDGGRA AXXXAAGGGA AAAKTNNNAA 
AATTGGGCC 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GCATTAGTAG AGATATGKAS AGRRRDGGRA AXXXAAGGGA AAADTYNNAA 
AATTGGGCC 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

GCATTAGTAG AGATATGTAS AGRRADGGAA AXGGAAGGGA AAATTNNNNA 
AATTGGGCC 
(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

GCATTAGTAG AGATATGTAC AGRGAGGGAA AXGGAAGGGA AAATTNNNNA 
AATTGGGCC 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

GCATTAGTAG AGATATGTAS AGRGAGGGAA AXGGAAGGGA AAATTNNNNA 
AATTGGGCC 
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

GCATTAGTAG GAGGNNNGAC AGGGRKGGAA AXXMAAGGGA AAAKTNNNAA 
AATTGGGCC 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

TCGAGATAAT CTATGTCCTC GTCTACTATG TCATAATCTT CTTTACTTAA 
ACGGTCCTTT 

TACCTTTGGT TTTTACTATC CCCCTTAACC TCCAAAATAG TTTCATTCTG 
TCATGCTAGT 

CTATGGACAT CTTTAGACAC CTGTATTTCG ATATCCATGT 
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

NNGAGATANN NTATGTCCTC GTCYACTATG TNANNNNNNN NNNNNNNNAA 
ACGGTCCTNN 

NNNNNNNNNN OTNNNNNN CNNCNTAACC TCCAAAATAN NNNNNNTCTN
NNNNANNNNT 
CTANNNGNAG NNNNAGANAR NCCNNNNNNN NNATNCATGT



160 



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TCGAGATAAT CTATGTCCTC GTCTACTATG TCATAATNNN NNNNACTTAA 
ACGGTCCTTT 

TACCTTTGGT TTTTACTATC CCCCTTAACC TCCAAAATAG TTTCATTCTG 
NCATANNAGT 



CTATGNGNNG NNNTAGACAG NCCNNNNTCG ATATCCATGT 



160 
(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TCGAGATAAT CTATGTCCTC GTCTACTATG TCATAATCTT CTTTACTTAA 
ACGGTCCTTT 

TACCTTTGGT TTTTACTATC CNNCTTAACC TCCAAAATAG TTTCATTCTG 
TCATACTAGT 

CTATGGGTAG CTTTAGACCN CCGTATTTCG ATATCCATGT 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

TCGAGATAAT CTATGTCCTC GTCTACTATG TCATAATCTT CTTTACTTAA 
ACGGTCCTTT 

TACCTTTGGT TTTTACTATC CCNCTTAACC TCCAAAATAG TTTCATTCTG 
TCATACTAGT 

CTATGGGTAG CTTTAGACCC CCGTATTTCG ATATCCATGT 
(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
NCGGGATANT NTATGTCCTC GTCYACTATG TCANNNNNCN NNCNNNNCAA 
ACGGTCCNCC 

NNNNNCNNNN KNCNNCYANG AANCYCAACC TCCAAAATAN NNNNNNTCTN
NNNNANNNCN 

CTNNNNNNAG NGNNAGACAC CTGTATNNNN NTATNCAYGT 
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 






(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

TCGRGATAAT CTATGTCCTC GTCTACTATG TCATAATCCN NNCNNCTCAA 
ACGGTCCTYC

CNNNNYTGGT TNYTACTATC CCCCTTAACC TCCAAAATAG TTTCATTCTG 
NCATACNNST 

CTANNNNNAG NGTTAGACAC CTGTATTTCG ATATCCATGT 160 

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

TCGAGATAAT CTATGTCCTC GTCTACTATG TCATAATCCN NCCTACTCAA 
ACGGTCCTTC 

TACCTTTGGT TTTTACTATC CMCCTTAACC TCCAAAATAG TTTCATTCTG 
TCATACTAGT 

CTATGAGTAG CTTTAGACAC CTGTATTTCG ATATCCATGT 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 160 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

TCGAGATAAT CTATGTCCTC GTCTACTATG TCATAATCTT CTTTACYCAA 
ACGGTCCTXC 

TACCTTTGGT TTTTACTATC CCMCTTAACC TCCAAAATAG TTTCATTCTG 
TCATACTAGT 

CTATGAGTAG CTTTAGACAC CTGTATTTCG ATATCCATGT 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
AAACCCAATC CACATCM 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
MMACNCANNC CACANNM 
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
TTGGGTACCA C 



(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
TTGAAMACCA C 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
ACAGAAATGG A 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
AGAGRATDGG R 
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
ASAGRRADGG A 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
ACAGGGRRGG A 
(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
CTGGGGGGTA T 



(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 
CTGGCCSGTG T 



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CTGGGCGGTA T 



(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (oligonucleotide) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CTGGCACGTG T 11 



Claims 


1. In a computer system, a method of identifying an unknown base in a sample nucleic acid sequence, said method
comprising the steps of:

inputting a plurality of probe intensities, each of said probe intensities being associated with a nucleic acid
probe;

said computer system comparing said plurality of probe intensities wherein each of said plurality of probe
intensities is substantially proportional to said associated nucleic acid probe hybridizing with at least one nucleic
acid sequence, said at least one nucleic acid sequence including said sample sequence; and
calling said unknown base according to results of said comparing step.

2. In a computer system, a method of identifying an unknown base in a sample nucleic acid sequence, said method
comprising the steps of:

inputting a plurality of probe intensities, each of said probe intensities being associated with a nucleic acid
probe;

said computer system comparing said plurality of probe intensities wherein each of said plurality of probe
intensities is substantially proportional to said associated nucleic acid probe hybridizing with said sample sequence;
and

calling said unknown base according to results of said comparing step.

3. The method of claim 2, wherein said comparing step includes the step of said computer system calculating a ratio
of a higher probe intensity to a lower probe intensity.

4. The method of claim 3, wherein said calling step includes the step of calling said unknown base according to said
probe associated with said higher probe intensity if said ratio is greater than a predetermined ratio value.

5. The method of claim 4, wherein said predetermined ratio value is approximately 1.2.

6. In a computer system, a method of identifying an unknown base in a sample nucleic acid sequence, said method
comprising the steps of:

inputting a first set of probe intensities, each of said probe intensities in said first set being associated with
a nucleic acid probe and substantially proportional to said associated nucleic acid probe hybridizing with a reference
nucleic acid sequence;

inputting a second set of probe intensities, each of said probe intensities in said second set being associated
with a nucleic acid probe and substantially proportional to said associated nucleic acid probe hybridizing with said
sample sequence;

said computer system comparing at least one of said probe intensities in said first set and at least one of
said probe intensities in said second set; and

calling said unknown base according to results of said comparing step.

each probe intensity of a probe hybridizing with said sample sequence.

8. The method of claim 7, wherein said comparing step further includes the step of calculating third ratios of said first
ratios to said second ratios.

9. The method of claim 8, wherein said calling step includes the step of calling said unknown base according to said
probe associated with a highest third ratio.

10. The method of claim 6, wherein said comparing step includes the step of calculating a ratio of a highest probe
intensity in said first set to a highest intensity in said second set.
11. The method of claim 10, wherein said comparing step further includes the step of comparing said [...]

nucleic acid probes.

12. In a computer system, a method of identifying an unknown base in a sample nucleic acid sequence, said method
[...]

computer system comparing at least one of said plurality of probe intensities with said statistics; and
calling said unknown base according to results of said comparing step.

13. The method of claim 12, further comprising the step of calculating said statistics.

14. The method of claim 12, wherein said statistics include a mean and standard deviation.

15. A method of processing first and second nucleic acid sequences, comprising the steps of:
providing a plurality of nucleic acid probes;
labeling said first nucleic acid sequence with a first marker;
labeling said second nucleic acid sequence with a second marker; and

hybridizing said first and second labeled nucleic acid sequences at the same time.

16. The method of claim 15, wherein said plurality of nucleic acid probes are on a chip.

17. The method of claim 15, further comprising the step of fragmenting said first and second nucleic acid sequences
at the same time.

18. The method of claim 15, further comprising the step of scanning for said first and second markers on said chip, [...]
first and second labeled nucleic acid sequences being on said chip.
19. The method of claim 15, wherein said first and second markers are fluorescent markers that emit light at different
wavelengths upon excitation.
20. In a computer system, a method of identifying mutations in a sample nucleic acid sequence, [...]

inputting a first set of probe intensities, each of said probe intensities in said first set being associated with
a nucleic acid probe and substantially proportional to said associated nucleic acid probe hybridizing with a reference
nucleic acid sequence;
inputting a second set of probe intensities, each of said probe intensities in said second set being associated
with a nucleic acid probe and substantially proportional to said associated nucleic acid probe hybridizing with said
sample sequence;

said computer system comparing probe intensities in said first set and probe intensities in said second set
to select hybridization regions where said probe intensities in said first set and said probe intensities in said second
set differ; and

identifying mutations according to characteristics of said selected regions.

21. The method of claim 20, wherein said selected regions are determined by comparing probe intensities of wild-type
probes.

22. The method of claim 21, wherein said wild-type probes are complementary to a portion of said reference sequence.

23. The method of claim 21, wherein said identifying step further includes the steps of:
analyzing a size of a selected region;

identifying a likely position of a mutation in said selected region according to an interrogation position of said
nucleic acid probes; and

performing base calling at said likely position.

24. In a computer system, a method of analyzing a plurality of sequences of bases, said plurality of sequences including
at least one reference sequence and at least one sample sequence, the method comprising the steps of:
displaying said at least one reference sequence in a first area on a display device; and
displaying said at least one sample sequence in a second area on said display device;
whereby a user is capable of visually comparing said plurality of sequences.

25. The method of claim 24, wherein said plurality of sequences are monomer strands of DNA or RNA.

26. The method of claim 24, wherein said at least one reference sequence includes a chip wild-type that has been tiled
on a chip.

27. The method of claim 26, wherein said chip wild-type sequence is displayed as a first sequence in said first area.

28. The method of claim 26, further comprising the step of displaying a label in said first area to identify said chip
wild-type sequence.

29. The method of claim 24, wherein said at least one sample sequence has been hybridized on a chip.

30. The method of claim 24, further comprising the step of indicating bases that differ among a plurality of user selected
sequences.

31. The method of claim 24, further comprising the steps of:

displaying a name associated with each of said at least one reference sequence in said first area; and
displaying a name associated with each of said at least one sample sequence in said second area.

32. The method of claim 24, further comprising the step of linking at least one reference sequence in said first area with
at least one sample sequence in said second area.

33. The method of claim 32, further comprising the step of indicating on said display device which sequences are linked.

34. The method of claim 24, further comprising the step of indicating bases of said at least one sample sequence that
are not equal to a corresponding base in said at least one reference sequence.

35. The method of claim 24, wherein said at least one reference sequence and said at least one sample sequence are 
aligned on said display device, hybridization with said probes. 